Error-Detection Codes:
Algorithms  and  Fast  Implementation 

Gam  D.  Nguyen 


Abstract — Binary  CRCs  are  very  effective  for  error  detection,  but  their  software  implementation  is  not  very  efficient.  Thus,  many  binary 
non-CRC  codes  (which  are  not  as  strong  as  CRCs,  but  can  be  more  efficiently  implemented  in  software)  are  proposed  as  alternatives 
to  CRCs.  The  non-CRC  codes  include  WSC,  CXOR,  one’s-complement  checksum,  Fletcher  checksum,  and  block-parity  code.  In  this 
paper,  we  present  a  general  algorithm  for  constructing  a  family  of  binary  error-detection  codes.  This  family  is  large  because  it  contains 
all  these  non-CRC  codes,  CRCs,  perfect  codes,  as  well  as  other  linear  and  nonlinear  codes.  In  addition  to  unifying  these  apparently 
disparate  codes,  our  algorithm  also  generates  some  non-CRC  codes  that  have  minimum  distance  4  (like  CRCs)  and  efficient  software 
implementation. 

Index Terms: Fast error-detection code, Hamming code, CRC, checksum.
1 Introduction

Efficient implementation of reliable error-protection
algorithms plays a vital role in digital communication
and storage because channel noise and system malfunction
introduce errors in received and retrieved messages.
Here, we focus on binary error-detection codes that have
low overhead and minimum distance $d \le 4$. Popular
error-detection codes used in practice include CRCs that
are generated by binary polynomials such as
$X^{16} + X^{15} + X^2 + 1$ (called CRC-16) and
$X^{16} + X^{12} + X^5 + 1$ (called CRC-CCITT).

An h-bit CRC generated by $G(X) = (X + 1)P(X)$, where
$P(X)$ is a primitive polynomial of degree $h - 1$, has the
following desirable properties [1]. The CRC has maximum
codeword length of $2^{h-1} - 1$ bits and minimum distance
d = 4, i.e., all double and odd errors are detected. This code
also detects any error burst of length h bits or less, i.e., its
burst-error-detecting capability is b = h. The guaranteed
error-detection capability of this h-bit CRC is nearly optimal
because its maximum codeword length almost meets the
upper bound $2^{h-1}$ and its burst-error-detecting capability
meets the upper bound h. The CRC is also efficiently
implemented by special-purpose shift-register hardware.
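
As a small illustration of the form $G(X) = (X + 1)P(X)$, the following sketch divides the two generator polynomials quoted above by X + 1 over GF(2). The integer encoding (bit i holds the coefficient of $X^i$) and the helper name `gf2_divmod` are assumptions made for this example only. The sketch confirms divisibility by X + 1; it does not test whether the cofactor is primitive.

```python
def gf2_divmod(a, b):
    """Divide binary polynomial a by b; bit i of an int is the coefficient of X^i."""
    quotient, deg_b = 0, b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        shift = a.bit_length() - 1 - deg_b
        quotient |= 1 << shift      # record the term X^shift of the quotient
        a ^= b << shift             # subtract (XOR) the aligned divisor
    return quotient, a              # (quotient, remainder)

CRC16 = 0x18005      # X^16 + X^15 + X^2 + 1
CCITT = 0x11021      # X^16 + X^12 + X^5 + 1
X_PLUS_1 = 0b11      # X + 1

for name, g in (("CRC-16", CRC16), ("CRC-CCITT", CCITT)):
    cofactor, remainder = gf2_divmod(g, X_PLUS_1)
    print(name, "cofactor:", hex(cofactor), "remainder:", remainder)
```

A zero remainder means every codeword polynomial vanishes at X = 1, which is why all odd-weight errors are detected.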

Although  CRCs  have  nearly  optimal  properties  and 
efficient  hardware  implementation,  many  binary  non-CRC 
codes  are  proposed  as  alternatives  to  CRCs.  These  codes, 
developed  over  many  years  and  often  considered  as 
unrelated  to  each  other,  do  not  have  the  CRC's  desirable 
properties.  Such  non-CRC  codes  include  weighted  sum 
code  (WSC),  Fletcher  checksum  (used  in  ISO),  one's- 
complement  checksum  (used  in  Internet),  circular-shift 
and  exclusive-OR  checksum  (CXOR),  and  block-parity  code 
(Fig.  1).  See  [4],  [5],  [9],  [14]  for  implementation  and 
performance comparisons of CRCs and these non-CRC
codes.  Perhaps  the  key  reason  for  the  appearance  of  the 
non-CRC  codes  is  that  CRCs  are  not  very  efficiently 
implemented  in  software.  Software  complexity  refers  to 
the  number  of  programming  operations  and  hardware 
complexity  refers  to  the  number  of  gates  required  for  code 
implementation.  Investigations  reported  in  [4],  [9]  indicate 
that  software  processing  of  CRCs  is  slower  than  that  of  the 
non-CRC  codes.  Thus,  it  is  desirable  to  design  error- 
detection  codes  that  are  reliable  and  of  low  complexity. 
One  code  is  better  than  another  if,  for  a  fixed  number  of 
check  bits  h,  it  has  larger  minimum  distance  d,  larger  burst- 
error-detecting  capability  b,  longer  maximum  codeword 
length $l_{\max}$, and lower complexity.

An  important  performance  measure  of  a  code,  which  is 
not  addressed  in  this  paper,  is  its  probability  of  undetected 
error.  For  the  binary  symmetric  channel,  this  probability  can 
be  expressed  in  terms  of  the  weight  distribution  of  the  code. 
In  general,  the  problem  of  computing  the  probability  of 
undetected error is NP-hard [7]. Some methods for
calculating or estimating this probability are given in [7].

Because the minimum distance d is often considered the
most important parameter, Fig. 1 ranks CRC as the best
code,  WSC  the  second  best,  and  so  on.  Although  the  WSC, 
Fletcher  checksum,  and  CXOR  are  defined  only  for  an  even 
number  of  check  bits  h,  both  even  and  odd  h  can  be  used  for 
the  other  codes.  The  CRC,  WSC,  and  Fletcher  checksum  can 
be  extended  to  have  infinite  length,  but  their  minimum 
distances  all  reduce  to  2.  Some  discussions  of  burst-error- 
detecting  capability  b  are  given  in  Appendix  C  (which  can 
be found on the Computer Society Digital Library at
http://computer.org/tc/archives.htm). In this paper, we focus on
code  implementation  by  means  of  software.  Because 
computers  can  process  information  in  blocks  of  bits  (e.g., 
bytes or words), codes having efficient software implementation
should also be processed in blocks of bits. Then, it is
natural  to  express  code  lengths  in  terms  of  the  number  of 
blocks  n  and  each  block  is  s  bits,  i.e.,  the  total  number  of  bits 
is ns. Most modern processors can efficiently handle block
size s = 8, 16, 32, 64 bits. General-purpose computers and
compilers are increasingly faster and better. Thus, software
algorithms become more relevant and desirable. Software
algorithms are increasingly used in operations, modeling,
simulations, and performance analysis of systems and
networks. An important advantage of software implementation
is its flexibility: It is much simpler to modify a
software program than to modify a chip full of hardwired
gates and buses.

Fig. 1. Error-detection capabilities of binary codes (d = minimum
distance, b = burst-error-detecting capability, h = number of check
bits, $l_{\max}$ = maximum codeword length).

In  this  paper,  we  present  a  general  algorithm  and  its 
systematic  versions  for  constructing  a  large  family  of  binary 
error-detection  codes  (Section  2).  This  family  contains  all  the 
codes  in  Fig.  1  and  other  linear  and  nonlinear  codes  for  error 
detection.  We  unify  the  treatment  of  these  seemingly 
unrelated  codes  by  showing  that  CRCs  and  the  non-CRC 
codes  all  come  from  a  single  algorithm  (Section  3).  Further, 
the  algorithm  can  produce  some  non-CRC  codes  that  are  not 
only  reliable  (i.e.,  having  minimum  distance  4  as  CRCs),  but 
also  have  fast  software  implementation  (Section  4).  We  then 
summarize  and  conclude  the  paper  (Section  5).  The  paper  is 
supplemented with appendices (which can be found on the
Computer Society Digital Library at
http://computer.org/tc/archives.htm) that include theorem proofs,
code segments implemented in the C programming language, as well as
discussions of CRCs, WSCs, and CXORs. A preliminary
version of this paper was presented in [10].

1.1  Notations  and  Conventions 

We consider polynomials over only the binary field GF(2),
i.e., the polynomial operations are performed in polynomial
arithmetic modulo 2. Let A = A(X) and B = B(X) be two
polynomials, then A mod B is the remainder polynomial
that is obtained when A is divided by B, with
deg(A mod B) < deg(B). To ease the presentation of many
different codes (which can result in a large number of
parameters), we adopt the following sweeping conventions.
A j-tuple $(a_0, \ldots, a_{j-2}, a_{j-1})$ denotes the binary polynomial
$a_0 X^{j-1} + \cdots + a_{j-2} X + a_{j-1}$ of degree less than j. In this
paper, lowercase letters (such as h and $a_0$) denote
nonnegative integers. The letters C and $C_1$ denote codes,
other uppercase letters (such as A and $Q_i$) denote
polynomials (or tuples), and X denotes the variable (or
indeterminate) of these polynomials. Further, the variable X
will be omitted from all polynomials, i.e., A(X) will be
denoted as A. We denote $u^j$ as the j-tuple whose
components are all u's, $u \in \{0, 1\}$. The notation (l, k, d)
denotes a systematic code with l = code length, k =
information block length, and d = minimum distance.
Finally, if $Y_1$ and $Y_2$ are an $m_1$-tuple and an $m_2$-tuple, respectively,
then $Y = (Y_1, Y_2)$ denotes the concatenation of $Y_1$ to $Y_2$, i.e.,
Y is an $(m_1 + m_2)$-tuple. Note that Y can also be written as
$Y = Y_1 X^{m_2} + Y_2$. For ease of cross-referencing, we usually
label blocks of text as "Remarks." These remarks are
integral parts of our presentation and they should not be
viewed as isolated observations or comments.
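
The tuple and polynomial conventions map naturally onto machine integers. The sketch below is one possible encoding, assumed here for illustration only: a tuple is stored as an int whose most significant bit is the first component, so that concatenation becomes a shift and A mod B becomes XOR-based long division. The function names are hypothetical.

```python
def tuple_to_poly(bits):
    """(a_0, ..., a_{j-1}) -> a_0 X^(j-1) + ... + a_{j-1}, stored as an int."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def concat(y1, y2, m2):
    """(Y1, Y2) = Y1 * X^m2 + Y2, where Y2 is an m2-tuple."""
    return (y1 << m2) | y2

def gf2_mod(a, b):
    """A mod B over GF(2); the result has degree less than deg(B)."""
    deg_b = b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a

print(bin(tuple_to_poly((1, 0, 1, 1))))   # X^3 + X + 1 -> 0b1011
print(bin(concat(0b10, 0b101, 3)))        # (10, 101)   -> 0b10101
print(bin(gf2_mod(0b11001, 0b111)))       # (X^4 + X^3 + 1) mod (X^2 + X + 1) -> 0b10
```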

2  A  General  Algorithm  for  Error-Detection 
Codes 

In this section, we define a binary code so that each of its
codewords consists of n tuples $Q_0, Q_1, \ldots, Q_{n-1}$, where each
tuple is s bits. This code is not necessarily systematic and is
formulated abstractly to facilitate the development of its
mathematical properties. For practical use, we then construct
systematic versions of the code. Fast software
versions of the code will be presented later in Section 4. It
is important to note that $Q_i$ is an uppercase letter, so, by our
conventions, $Q_i$ is a polynomial of the variable X, i.e.,
$Q_i = Q_i(X)$. Further, being an s-tuple, $Q_i$ is also a
polynomial of degree less than s. The polynomial notation
facilitates the mathematical development of codes. The
tuple notation is more appropriate for software implementation
of codes because an s-tuple is a group of s bits, which
can be easily processed by computers. Note also that the
ns-tuple $(Q_0, Q_1, \ldots, Q_{n-2}, Q_{n-1})$ is equivalent to the
polynomial $\sum_{i=0}^{n-1} Q_i X^{(n-1-i)s}$ of degree less than ns.

First, let $C_1$ be a binary code with length s and minimum
distance $d_1$. Let r and n be integers such that $1 \le n \le 2^r$. Let
$W_0, W_1, \ldots, W_{n-1}$ be distinct polynomials of degree less than
r. Let M be a polynomial of degree r such that M and X are
relatively prime, i.e., gcd(M, X) = 1. Also, $Q_i$ is an s-tuple,
$i \ge 0$. Now, we are ready to introduce a new code that is
simply called "the code C" and is the focus of this paper.

Algorithm 1. Let C be the binary code such that each of its
codewords

$$(Q_0, Q_1, \ldots, Q_{n-2}, Q_{n-1}) \qquad (1)$$

satisfies the following two conditions:

$$\left( \sum_{i=0}^{n-1} Q_i W_i \right) \bmod M = 0 \qquad (2)$$

$$\sum_{i=0}^{n-1} Q_i \in C_1. \qquad (3)$$
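
Conditions (2) and (3) can be checked directly in software. The following is a minimal sketch, not the paper's implementation: tuples and polynomials are stored as ints (bit i is the coefficient of $X^i$), and the names `gf2_mul`, `gf2_mod`, `is_codeword`, and `in_c1` are hypothetical. The sample parameters are those of Example 1 (s = 4, r = 2, $M = X^2 + X + 1$, repetition code as $C_1$).

```python
def gf2_mul(a, b):
    """Carry-less product of two binary polynomials."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    """Remainder of a divided by m over GF(2)."""
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def is_codeword(q, w, m_poly, in_c1):
    """True if the tuples q satisfy conditions (2) and (3) of Algorithm 1."""
    weighted, plain = 0, 0
    for qi, wi in zip(q, w):
        weighted ^= gf2_mul(qi, wi)   # running sum of Q_i W_i
        plain ^= qi                   # running sum of Q_i
    return gf2_mod(weighted, m_poly) == 0 and in_c1(plain)

M = 0b111                             # X^2 + X + 1
W = [0b11, 0b10, 0b01, 0b00]          # X + 1, X, 1, 0
in_repetition = lambda y: y in (0b0000, 0b1111)

print(is_codeword([0b1011, 0b0110, 0b1010, 0b1000], W, M, in_repetition))  # True
print(is_codeword([0b1011, 0b0110, 0b1010, 0b1001], W, M, in_repetition))  # False
```

The second call flips one bit of the last tuple; the weighted sum is unchanged because $W_3 = 0$, and the error is caught by condition (3) alone.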

Remark 1.

1. C is nonlinear if $C_1$ is nonlinear.

2. From (3), the codewords of C have even weights
if the codewords of $C_1$ have even weights.

3. The code $C_1$ in Algorithm 1 can be nonsystematic.
However, we focus only on systematic codes,
which are more often used in practice. Thus, we
assume that $C_1$ is an $(s, s - m, d_1)$ systematic code
with m check bits, $0 \le m \le s$. Let F be the
encoder of $C_1$. Then, each codeword of $C_1$ is
$U X^m + F(U) = (U, F(U))$, where U is an information
$(s - m)$-tuple and F(U) is the corresponding
check m-tuple.

4. In Algorithm 1, the weights $W_0, W_1, \ldots, W_{n-1}$ can
be chosen to be distinct polynomials of degree
less than r because $1 \le n \le 2^r$. However, Algorithm 1
can be extended to allow $n > 2^r$; then,
$W_0, W_1, \ldots, W_{n-1}$ will not always be distinct (see
Section 3 later).

5. All the codes considered in this paper are binary
codes, i.e., their codewords consist of digits 0 or 1.
In particular, the code C is a binary code whose
codewords are ns bits. Computers can efficiently
process groups of bits. Thus, as seen in (1), each
ns-bit codeword is grouped into n tuples, s bits
each. Note that this binary code C can also be
viewed as a code in $GF(2^s)$, i.e., as a code whose
codewords consist of n symbols, each symbol
belonging to $GF(2^s)$. More generally, suppose that
ns = xy for some positive integers x and y, then
this same binary code C can also be viewed as a
code whose codewords consist of x symbols, each
symbol belonging to $GF(2^y)$. In the extreme case
(x = 1, y = ns), the code C is also a code whose
codewords consist of only one symbol that
belongs to $GF(2^{ns})$. Note that, when the same
code is viewed in different alphabets, the
respective minimum distances can be very different.
For example, consider the binary repetition
code $\{0^k, 1^k\}$ of length k > 1. When viewed as a
binary code over GF(2), this code has minimum
distance d = k. But, when viewed in $GF(2^k)$, this
same code has minimum distance d = 1.
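
The alphabet dependence noted in Remark 1.5 is easy to reproduce. The sketch below counts the y-bit symbols in which two words differ; the function name and the choice k = 8 are assumptions for this illustration.

```python
def symbol_distance(u, v, total_bits, y):
    """Number of y-bit symbols in which two total_bits-bit words differ."""
    diff = u ^ v
    mask = (1 << y) - 1
    return sum(1 for i in range(0, total_bits, y) if (diff >> i) & mask)

k = 8
zeros, ones = 0, (1 << k) - 1                # the two words of the repetition code
print(symbol_distance(zeros, ones, k, 1))    # viewed over GF(2):   8
print(symbol_distance(zeros, ones, k, k))    # viewed over GF(2^k): 1
```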

Let $d_1$ and $d_C$ be the minimum distances of the binary
codes $C_1$ and C, respectively, in Algorithm 1. We then have
the following theorem that is proven in Appendix A (which
can be found on the Computer Society Digital Library at
http://computer.org/tc/archives.htm).

Theorem 1.

1. $d_C \ge 3$ if $d_1 \ge 3$.

2. $d_C = 4$ if $d_1 \ge 4$.

Example 1. Now, we illustrate Algorithm 1 by constructing
a simple binary code C. Let s = 4, m = 3, r = 2,
and $n = 2^r = 4$. Thus, each codeword of the code C
is a 16-tuple $(Q_0, Q_1, Q_2, Q_3)$, where each $Q_i$ is a
4-tuple. Let $M = X^2 + X + 1$ be the modulating
(primitive) polynomial. Let the weighting polynomials
in (2) be $W_0 = X + 1$, $W_1 = X$, $W_2 = 1$, and $W_3 = 0$.
Let $C_1 = \{(0,0,0,0), (1,1,1,1)\}$, i.e., $C_1$ is the (4, 1, 4)
repetition code. Now, we wish to specify the desired
codeword $(Q_0, Q_1, Q_2, Q_3)$. Let $Q_0$ and $Q_1$ be two
arbitrary 4-tuples. Then, $Q_2$ and $Q_3$ are determined as
follows: Let $U_1$ and $U_2$ be an arbitrary 2-tuple and 1-tuple,
respectively. Then, we define $Q_2 = (U_1, P_1)$ and
$Q_3 = (U_2, P_2)$, where $P_1$ and $P_2$ are determined as follows:
First, compute the check 2-tuple
$P_1 = (Q_0 W_0 + Q_1 W_1 + U_1 X^2) \bmod M$. Next, define

$$Y = Q_0 + Q_1 + (U_1 X^2 + P_1) + U_2 X^3 = Q_0 + Q_1 + Q_2 + U_2 X^3,$$

which is a 4-tuple. Thus, Y can be written as
$Y = Y_1 X^3 + Y_2 = (Y_1, Y_2)$, where $Y_1$ is a 1-tuple and $Y_2$
is a 3-tuple. Finally, we compute $P_2 = Y_2 + (Y_1, Y_1, Y_1)$,
which is a 3-tuple.

Now, we will show that the codeword
$(Q_0, Q_1, Q_2, Q_3) = (Q_0, Q_1, U_1, P_1, U_2, P_2)$ satisfies (2) and
(3) in Algorithm 1. Since
$P_1 = (Q_0 W_0 + Q_1 W_1 + U_1 X^2) \bmod M$, we have

$$0 = (Q_0 W_0 + Q_1 W_1 + U_1 X^2 + P_1) \bmod M.$$

Then, $0 = (Q_0 W_0 + Q_1 W_1 + Q_2 W_2 + Q_3 W_3) \bmod M$ because
$Q_2 W_2 = U_1 X^2 + P_1$ and $Q_3 W_3 = 0$. Thus,
$(Q_0, Q_1, Q_2, Q_3)$ satisfies (2). Next,

$$\begin{aligned}
Q_0 + Q_1 + Q_2 + Q_3 &= Y + U_2 X^3 + Q_3 && \text{(because } Y = Q_0 + Q_1 + Q_2 + U_2 X^3\text{)} \\
&= Y + U_2 X^3 + (U_2, P_2) \\
&= Y + P_2 && \text{[because } (U_2, P_2) = U_2 X^3 + P_2\text{]} \\
&= (Y_1, Y_2) + Y_2 + (Y_1, Y_1, Y_1) \\
&= (Y_1, Y_1, Y_1, Y_1) \in C_1.
\end{aligned}$$

Thus, $(Q_0, Q_1, Q_2, Q_3) = (Q_0, Q_1, U_1, P_1, U_2, P_2)$ also satisfies
(3). By exchanging $P_1$ and $U_2$, the codeword becomes
$(Q_0, Q_1, U_1, U_2, P_1, P_2)$, which is a codeword of a systematic
code because $(Q_0, Q_1, U_1, U_2)$ are the 11 information
bits and $(P_1, P_2)$ are the 5 corresponding check bits.
Because $d_1 = 4$, $d_C = 4$ by Theorem 1.2. Thus, C is
identical to the (16, 11, 4) extended Hamming code.
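
Example 1 is small enough to verify exhaustively. The sketch below encodes all $2^{11}$ information words with the parameters of Example 1 and computes the minimum nonzero codeword weight, which equals the minimum distance because this particular encoder is linear. The integer encoding of tuples and all function names are assumptions of this illustration, not the paper's C code.

```python
from itertools import product

M = 0b111                 # X^2 + X + 1
W0, W1 = 0b11, 0b10       # X + 1 and X (W2 = 1 and W3 = 0 are implicit below)

def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def encode(q0, q1, u1, u2):
    """Return (Q0, Q1, Q2, Q3) with Q2 = (U1, P1) and Q3 = (U2, P2)."""
    p1 = gf2_mod(gf2_mul(q0, W0) ^ gf2_mul(q1, W1) ^ (u1 << 2), M)
    q2 = (u1 << 2) | p1
    y = q0 ^ q1 ^ q2 ^ (u2 << 3)
    y1, y2 = y >> 3, y & 0b111
    p2 = y2 ^ (0b111 if y1 else 0b000)    # Y2 + (Y1, Y1, Y1)
    return (q0, q1, q2, (u2 << 3) | p2)

def weight(codeword):
    return sum(bin(q).count("1") for q in codeword)

codewords = [encode(q0, q1, u1, u2)
             for q0, q1, u1, u2 in product(range(16), range(16), range(4), range(2))]
min_weight = min(weight(cw) for cw in codewords if any(cw))
print(len(codewords), min_weight)         # 2048 4
```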

2.1  Systematic  Encoding 

In general, the binary code C in Algorithm 1 is not
systematic. Now, we construct its systematic versions.
Recall that r is the degree of the modulating polynomial M
and s is the number of bits contained in each tuple $Q_i$. Let
$r \le s$ and suppose that information tuples

$$(Q_0, Q_1, \ldots, Q_{n-3}, U_1, U_2) \qquad (4)$$

are given, where $U_1$ is an $(s - r)$-tuple and $U_2$ is an
$(s - m)$-tuple. We wish to append a check r-tuple $P_1$ and
a check m-tuple $P_2$ to (4) so that the resulting codeword is

$$(Q_0, Q_1, \ldots, Q_{n-3}, U_1, U_2, P_1, P_2). \qquad (5)$$

Thus, the code C is ns bits long and has h = r + m check
bits. Denote $d_C$ as its minimum distance, then C is an
$(ns, ns - r - m, d_C)$ code. Then, we have the following
algorithm that is proven in Appendix A (which can be
found on the Computer Society Digital Library at
http://computer.org/tc/archives.htm).

Algorithm 1a. When $r \le s$, the two check tuples of a
systematic version of the binary code C can be computed by

$$P_1 = \left( \sum_{i=0}^{n-3} Q_i W_i + U_1 X^r \right) \bmod M \qquad (6)$$

$$P_2 = Y_2 + F(Y_1), \qquad (7)$$

where $W_i \ne 0, 1$ and F is the encoder of $C_1$ as defined in
Remark 1.3. The tuples $Y_1$ and $Y_2$ are determined as
follows: Let

$$Y = \sum_{i=0}^{n-3} Q_i + U_1 X^r + P_1 + U_2 X^m,$$

which is an s-tuple that can be written as
$Y = Y_1 X^m + Y_2 = (Y_1, Y_2)$, where $Y_1$ and $Y_2$ are an
$(s - m)$-tuple and an m-tuple, respectively.
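
Equations (6) and (7) translate into a short routine. What follows is a sketch under stated assumptions: tuples are ints, the encoder F of $C_1$ is passed in as a Python callable `f`, and the name `encode_1a` and its argument order are invented for this illustration. It is checked against the parameters of Example 1.

```python
def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def encode_1a(q, w, u1, u2, m_poly, r, m, f):
    """Return (P1, P2); q and w hold Q_0..Q_{n-3} and W_0..W_{n-3}."""
    acc = u1 << r                       # U1 X^r term of (6)
    y = u1 << r                         # U1 X^r term of Y
    for qi, wi in zip(q, w):
        acc ^= gf2_mul(qi, wi)
        y ^= qi
    p1 = gf2_mod(acc, m_poly)           # equation (6)
    y ^= p1 ^ (u2 << m)
    y1, y2 = y >> m, y & ((1 << m) - 1)
    return p1, y2 ^ f(y1)               # equation (7)

# Example 1: s = 4, r = 2, m = 3, C1 = (4, 1, 4) repetition code.
repeat3 = lambda y1: 0b111 if y1 else 0b000
p1, p2 = encode_1a([0b1011, 0b0110], [0b11, 0b10], 0b10, 1, 0b111, 2, 3, repeat3)
print(bin(p1), bin(p2))                 # 0b10 0b0
```

Passing F as a callable keeps the routine independent of the choice of $C_1$, which is the degree of freedom the algorithm is built around.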

Remark 2. After $P_1$ is computed, $P_2$ is easily computed
when $C_1$ is one of the following four types of codes. The
first two types of codes, given in 1 and 2 below, are very
trivial, but they are used later in Section 3 to construct all
the codes in Fig. 1. The next two types of codes, given in
3 and 4 below, are commonly used in practice for error
control.

1. If m = s, then $C_1 = \{0^s\}$, which is an $(s, 0, d_1)$
code, where the minimum distance $d_1$ is undefined.
This very trivial code is called a useless
code because it carries no useful information.
However, it can detect any number of errors, i.e.,
we can assign $d_1 = \infty$ for this particular code.
Further, it can be shown that Theorem 1.2
remains valid when m = s, i.e., $d_C = 4$ if
$C_1 = \{0^s\}$. Then, from Algorithm 1a, we have
$U_2 = 0$, $F = 0^s$, $Y_1 = 0$, and

$$P_2 = Y_2 = Y = \sum_{i=0}^{n-3} Q_i + U_1 X^r + P_1.$$

2. If m = 0, then $C_1 = \{0, 1\}^s$, which is an (s, s, 1)
code. This very trivial code is called a powerless
code because it protects no information. From
Algorithm 1a, we have $Y_2 = 0$, $F = 0$,

$$Y_1 = Y = \sum_{i=0}^{n-3} Q_i + U_1 X^r + P_1 + U_2,$$

and $P_2 = 0$.

3. If $C_1$ is a systematic linear code with parity check
matrix $H_1 = [A \; I]$, where A is an $m \times (s - m)$
matrix and I is the $m \times m$ identity matrix, then
$F(U) = U A^{tr}$, where "tr" denotes matrix transpose.
Thus, $P_2 = Y_2 + F(Y_1) = Y_2 + Y_1 A^{tr} = Y H_1^{tr}$.

4. If $C_1$ is a CRC generated by a polynomial $M_1$ of
degree m, then $F(U) = (U X^m) \bmod M_1$ (see Appendix B,
which can be found on the Computer
Society Digital Library at
http://computer.org/tc/archives.htm). Thus,

$$P_2 = Y_2 + (Y_1 X^m) \bmod M_1 = (Y_1 X^m + Y_2) \bmod M_1 = Y \bmod M_1.$$

Fig. 2. Construction of the codes C using 16 check bits.

Algorithm 1a is for the case $r \le s$, where the check
r-tuple $P_1$ can be stored in a single s-tuple. Now, we
consider the case r > s. Then, several s-tuples are
needed to store the check r-tuple $P_1$. Because r > s,
we can write r = as + b, where $a \ge 1$ and $0 < b \le s$.
For example, let s = 8, then a = 1 and b = 4 if r = 12,
whereas a = 1 and b = 8 if r = 16. Thus, $P_1$ can be
stored in a + 1 tuples: The first tuple is b bits and
each of the next a tuples is s bits. Now, assume that
information tuples $(Q_0, Q_1, \ldots, Q_{n-a-3}, U_1, U_2)$ are given,
where each $Q_i$ is s bits, $U_1$ is s - b bits, and $U_2$ is s - m bits.
We assume here that $n - a - 3 \ge 0$, or $n \ge a + 3$, to avoid
triviality. We wish to append two check tuples $P_1$ and $P_2$ to
$(Q_0, Q_1, \ldots, Q_{n-a-3}, U_1, U_2)$ so that

$$(Q_0, Q_1, \ldots, Q_{n-a-3}, U_1, U_2, P_1, P_2)$$

becomes a codeword of a systematic $(ns, ns - r - m, d_C)$
code. Then, we have the following algorithm that is proven in
Appendix A (which can be found on the Computer Society
Digital Library at http://computer.org/tc/archives.htm).

Algorithm 1b. When r > s, the two check tuples of a
systematic version of the binary code C can be computed by

$$P_1 = \left( \sum_{i=0}^{n-a-3} Q_i W_i + U_1 X^r \right) \bmod M \quad \text{and} \quad P_2 = Y_2 + F(Y_1),$$

where F is the encoder of $C_1$ and

$$W_i \ne X^{as}, X^{(a-1)s}, \ldots, X^s, 1, 0.$$

The tuples $Y_1$ and $Y_2$ are determined as follows: Define

$$Y = \left( \sum_{i=0}^{n-a-3} Q_i \right) + (U_1 X^b + P_{10}) + \left( \sum_{i=1}^{a} P_{1i} \right) + U_2 X^m,$$

where $P_{10}$ is a b-tuple and $P_{11}, \ldots, P_{1a}$ are s-tuples that
satisfy $P_1 = (P_{10}, P_{11}, \ldots, P_{1a})$. Then, Y is an s-tuple that can
be written as $Y = Y_1 X^m + Y_2 = (Y_1, Y_2)$, where $Y_1$ and $Y_2$ are
an $(s - m)$-tuple and an m-tuple, respectively.
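
A sketch of Algorithm 1b for the parameters s = 8, r = 12, m = 4 (so a = 1 and b = 4) is given below. Several ingredients are assumptions chosen only for illustration: the degree-12 polynomial `M_POLY` (any polynomial with nonzero constant term would satisfy gcd(M, X) = 1), the particular (8, 4, 4) encoder `f_84`, the sample weights, and all function names. The final check rebuilds the tuples in the order used by Algorithm 1 and tests conditions (2) and (3).

```python
S, R, M_BITS, B = 8, 12, 4, 4          # s, r, m, b (with a = 1)
M_POLY = 0x180F                         # X^12 + X^11 + X^3 + X^2 + X + 1 (assumed choice)

def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def f_84(u):
    """Assumed (8, 4, 4) encoder: check bit i is the parity of the other three bits."""
    bits = [(u >> i) & 1 for i in range(4)]
    total = sum(bits) & 1
    return sum((total ^ bits[i]) << i for i in range(4))

def encode_1b(q, w, u1, u2):
    acc, y = u1 << R, 0
    for qi, wi in zip(q, w):
        acc ^= gf2_mul(qi, wi)
        y ^= qi
    p1 = gf2_mod(acc, M_POLY)                      # check r-tuple
    p10, p11 = p1 >> S, p1 & 0xFF                  # b-bit part and s-bit part
    y ^= ((u1 << B) | p10) ^ p11 ^ (u2 << M_BITS)
    y1, y2 = y >> M_BITS, y & 0xF
    return p1, y2 ^ f_84(y1)

def satisfies_algorithm_1(q, w, u1, u2):
    p1, p2 = encode_1b(q, w, u1, u2)
    tuples = list(q) + [(u1 << B) | (p1 >> S), p1 & 0xFF, (u2 << M_BITS) | p2]
    weights = list(w) + [1 << S, 1, 0]             # X^s, 1, 0 for the last three tuples
    weighted, plain = 0, 0
    for t, wt in zip(tuples, weights):
        weighted ^= gf2_mul(t, wt)
        plain ^= t
    return gf2_mod(weighted, M_POLY) == 0 and (plain & 0xF) == f_84(plain >> 4)

print(satisfies_algorithm_1([0x3A, 0xC5, 0x7E], [0b10, 0b11, 0b100], 0b1001, 0b0110))
```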

Example 2. Recall that C is an $(ns, ns - r - m, d_C)$ code that
is constructed by either Algorithm 1a (if $r \le s$) or
Algorithm 1b (if r > s). This code has h = r + m check
bits. In this example, we assume that h = 16 bits and we
present different ways to construct the codes C. The
results are shown in Fig. 2. For example, using
Algorithm 1b, we can construct the code C with the
following parameters: s = 8, r = 12, m = 4, $C_1$ = (8, 4, 4)
code, a = 1, and b = 4 (a and b are not needed in
Algorithm 1a). Assume that the number of s-tuples
satisfies $n \le 2^r$, i.e., the number of bits in each codeword
is $ns \le 2^r s = 2^{15}$. Then, the weighting polynomials $W_i$
can be chosen to be distinct. From Remark 2.1, we have
$d_1 \ge 4$. Then, from Theorem 1.2, all the codes C in Fig. 2
have minimum distance $d_C = 4$.

3  Some  Special  Error-Detection  Codes 

This section shows that the binary code C of Algorithm 1 is
general in the sense that it includes all the codes in Fig. 1 and
other codes as special cases. Recall that Algorithm 1's
systematic version is either Algorithm 1a (if $r \le s$) or
Algorithm 1b (if r > s), where r is the degree of the
modulating polynomial M and s is the number of bits
contained in each tuple $Q_i$. The code C depends on
components such as the parameters r, m, s, n, the weights
$W_0, W_1, \ldots, W_{n-1}$, and the code $C_1$. Thus, different components
will produce different codes C. We now show that
Algorithm 1 produces the codes in Fig. 1 by letting $C_1$ be
trivial codes such as (s, s, 1) and $\{0^s\}$ defined in Remark 2. The
algorithm also produces other linear and nonlinear codes
(Sections 3.1, 3.6, and 3.8). Generally, codes of $d_C = 4$ require
that $n \le 2^r$ and the weights $W_0, W_1, \ldots, W_{n-1}$ in Algorithm 1
be distinct. Codes of $d_C = 3$ require that $n \le 2^r + 1$ and allow
some of the weights to be repeated. Codes of $d_C = 2$ also allow
some of the weights to be repeated, but do not restrict the
value of n, i.e., the code lengths can be arbitrary. The
following codes are briefly presented because their detailed
discussions can be found elsewhere [4], [5], [9], [14].

3.1  Binary  Extended  Perfect  Code 

We now show that Algorithm 1 produces an extended
perfect code if the code $C_1$ is an extended perfect code.
Suppose that $C_1$ is a $(2^{m-1}, 2^{m-1} - m, 4)$ extended perfect
code (see [8, Chapter 6]), i.e., $s = 2^{m-1}$ and $d_1 = 4$. Let $n = 2^r$
and h = r + m, then the code C has $ns = 2^{r+m-1} = 2^{h-1}$ bits.
Then, $d_C = 4$ by Theorem 1.2 and C is a $(2^{h-1}, 2^{h-1} - h, 4)$
extended perfect code. Note that deleting a check bit from
an extended perfect code will yield a perfect code, while
adding an overall even parity bit to a perfect code will yield
an extended perfect code.

Algorithms 1a and 1b can be further generalized to
include the extended perfect code of [15] as follows: Recall
that $P_1$, $P_2$, and $Y_1$ are the check r-tuple, check m-tuple, and
$(s - m)$-tuple, respectively, which are computed from
Algorithms 1a or 1b. Let E(.) be any function from the set
of $(s - m)$-tuples to the set of r-tuples. Now, define the new
check r-tuple and check m-tuple by

$$P_1^* = P_1 + E(Y_1) \quad \text{and} \quad P_2^* = P_2 + \text{even parity of } E(Y_1).$$

Then, it can be shown that, if $C_1$ is an extended perfect code
and $n = 2^r$, then the resulting code C whose check tuples
are $P_1^*$ and $P_2^*$ is also an extended perfect code. Further,
when r = 1, this extended perfect code becomes the
extended perfect code that is obtained from the systematic
perfect code of [15].

3.2  Weighted  Sum  Code  (WSC) 

Consider the code C for the special case s = r = m. By
Remark 2.1, we have $C_1 = \{0^s\}$, $U_1 = 0$, $U_2 = 0$, $Y_1 = 0$, and
$Y_2 = Y = \sum_{i=0}^{n-3} Q_i + P_1$. From (6) and (7) of Algorithm 1a,
we have

$$P_1 = \left( \sum_{i=0}^{n-3} Q_i W_i \right) \bmod M \quad \text{and} \quad P_2 = \sum_{i=0}^{n-3} Q_i + P_1. \qquad (8)$$

Thus, this special code C is the WSC presented in [4], [9]. It
is shown in [3] that the WSC, when viewed as a code in
$GF(2^s)$, is equivalent to a lengthened single-error correcting
Reed-Solomon code (see also [8, p. 323]).
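
Equation (8) can be sketched for s = 8 as follows. The modulus 0x11D ($X^8 + X^4 + X^3 + X^2 + 1$) and the weights $W_i = i + 2$ (read as polynomials, so that they are distinct and differ from 0 and 1) are illustrative assumptions, as is the function name; the paper's own fast C implementation is not reproduced here.

```python
def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def wsc_checks(q, m_poly=0x11D):
    """Return (P1, P2) of equation (8) for 8-bit tuples q (at most 254 of them)."""
    weighted, plain = 0, 0
    for i, qi in enumerate(q):
        weighted ^= gf2_mul(qi, i + 2)   # W_i = i + 2: distinct, never 0 or 1
        plain ^= qi
    p1 = gf2_mod(weighted, m_poly)
    return p1, plain ^ p1

p1, p2 = wsc_checks([0x80])
print(hex(p1), hex(p2))                  # 0x1d 0x9d
```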

3.3  Block-Parity  Code 

Suppose that r = 0 and m = s. Thus, by Remark 2.1,
$C_1 = \{0^s\}$, $Q_{n-2} = U_1$, $P_1 = 0$ (because r = 0), $Y_1 = 0$, and
$U_2 = 0$ (because m = s). Then,

$$Y_2 = Y = \sum_{i=0}^{n-3} Q_i + U_1 = \sum_{i=0}^{n-2} Q_i.$$

From (7) of Algorithm 1a, we have $P_2 = Y = \sum_{i=0}^{n-2} Q_i$. Thus,
the resulting code C is the block-parity code presented in [4].
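
In software, the block-parity check tuple is a running XOR of the information tuples. A minimal sketch (the function name is hypothetical):

```python
from functools import reduce
from operator import xor

def block_parity(blocks):
    """P2 = sum of all information tuples over GF(2), i.e., their XOR."""
    return reduce(xor, blocks, 0)

info = [0x12, 0x34, 0x56]
check = block_parity(info)
print(hex(check))                         # 0x70
print(block_parity(info + [check]))       # 0: a valid codeword XORs to zero
```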

3.4  Cyclic  Redundancy  Code  (CRC) 

Consider an h-bit CRC that is q bits long and is generated by a
polynomial M. Suppose that q and h can be written as q =
x + (n - 1)s and h = as + b, where $n \ge 1$, $0 < x \le s$, $a \ge 0$,
and $0 < b \le s$ (see Appendix B, which can be found on the
Computer Society Digital Library at
http://computer.org/tc/archives.htm). Then, it is shown in Remark B1 that the
CRC check tuple is

$$P = \left( \sum_{i=0}^{n-a-2} Q_i W_i + U_1 X^r \right) \bmod M,$$

where $W_i = X^{(n-1-i)s} \bmod M$, $i = 0, 1, \ldots, n - a - 2$. Further,
we show in Remark B1 that the weighting polynomials $W_i$
are distinct and $W_i \ne 0, 1, X^s, \ldots, X^{as}$, provided that
$q \le 2^{h-1} - 1$ and M is the product of (X + 1) and a primitive
polynomial of degree h - 1.

Now, consider the code C that has the same length and
the same weighting polynomials as the above CRC. Let r =
h and m = 0. Then, $P_2 = 0$ by Remark 2.2 and $P_1 = P$ by
Algorithm 1a (if $r \le s$) or by Algorithm 1b (if r > s). Thus,
this particular code C is identical to the above CRC. So, any
CRC can be generated by either Algorithm 1a or
Algorithm 1b, i.e., by Algorithm 1.
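
The equivalence between the usual CRC definition and the weighted-sum form can be checked numerically. The sketch below uses s = 8 and the CRC-CCITT polynomial with h = 16, and assumes for simplicity that the message is a whole number of s-bit tuples, so that the check occupies exactly two tuples and $U_1$ is empty. Function names are hypothetical.

```python
S, H = 8, 16
M_POLY = 0x11021                       # X^16 + X^12 + X^5 + 1 (CRC-CCITT)

def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def crc_long_division(q):
    """(message * X^h) mod M, with the message formed by concatenating the tuples."""
    msg = 0
    for qi in q:
        msg = (msg << S) | qi
    return gf2_mod(msg << H, M_POLY)

def crc_weighted_sum(q):
    """Same check tuple, computed as sum of Q_i W_i with W_i = X^((n-1-i)s) mod M."""
    n = len(q) + H // S                # total tuples: information plus check
    acc = 0
    for i, qi in enumerate(q):
        w_i = gf2_mod(1 << ((n - 1 - i) * S), M_POLY)
        acc ^= gf2_mul(qi, w_i)
    return gf2_mod(acc, M_POLY)

message = [0x31, 0x32, 0x33]
print(crc_long_division(message) == crc_weighted_sum(message))   # True
```

Both routines agree because reduction modulo M is linear over GF(2); in practice the weights $W_i$ would be precomputed rather than recomputed per tuple.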

Remark 3. To construct other codes (such as CXOR
checksum and nonbinary Hamming codes), we need to
modify (3) by deleting $Q_{n-2}$ from the summation, but (2)
remains unchanged. That is, (3) is replaced by

$$\left( \sum_{i=0}^{n-3} Q_i + Q_{n-1} \right) \in C_1. \qquad (9)$$

Then, Algorithm 1a remains valid if we define
$Y = \sum_{i=0}^{n-3} Q_i + U_2 X^m$ because the term $Q_{n-2} = U_1 X^r + P_1$ is
absent from (9).

3.5  CXOR  Checksum 

Suppose now that we allow some of the polynomials
$W_0, W_1, \ldots, W_{n-1}$ in (2) to be repeated and we use
Algorithm 1a along with variation (9). Let r = s = m,
$M = X^s + 1$, and $W_i = X^i \bmod M$. It can be shown that
$W_{i+s} = W_i$ for all $i \ge 1$, i.e., some of the weighting
polynomials may repeat. Then, $C_1 = \{0^s\}$ (because m = s),
$U_1 = 0$ (because r = s), and $U_2 = Y_1 = 0$ (because m = s).
From (6) and (7), we have $P_1 = \sum_{i=0}^{n-3} Q_i X^i \bmod (X^s + 1)$
and $P_2 = Y_2 = Y = \sum_{i=0}^{n-3} Q_i$ (see Remark 2.1 and
Remark 3). Thus, the resulting code C is the CXOR
checksum presented in [4].
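
Multiplying an s-tuple by $X^i$ modulo $X^s + 1$ is a circular left shift by i positions, which is what makes the CXOR checksum cheap in software. The following sketch (hypothetical function names, s = 8 in the usage lines) computes both check tuples and cross-checks the rotation against the polynomial definition.

```python
def rotl(x, k, s):
    """Circular left shift of an s-bit value by k positions."""
    k %= s
    return ((x << k) | (x >> (s - k))) & ((1 << s) - 1)

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def cxor_checks(q, s):
    """P1 = sum of Q_i X^i mod (X^s + 1), P2 = sum of Q_i."""
    p1, p2 = 0, 0
    for i, qi in enumerate(q):
        p1 ^= rotl(qi, i, s)
        p2 ^= qi
    return p1, p2

print(cxor_checks([0x81, 0x81], 8))                        # (130, 0), i.e., (0x82, 0x00)
print(rotl(0xB7, 3, 8) == gf2_mod(0xB7 << 3, (1 << 8) | 1))  # True
```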

3.6  Nonbinary  Perfect  Code 

Suppose that Algorithm 1a along with variation (9) is
applied with r = s = m and $n = 2^m + 1$. Let M be a
primitive polynomial of degree m and let

$$W_0, W_1, \ldots, W_{n-3}$$

be distinct and nonzero polynomials. Then, $C_1 = \{0^s\}$,
$P_1 = \sum_{i=0}^{2^m - 2} Q_i W_i \bmod M$, and $P_2 = \sum_{i=0}^{2^m - 2} Q_i$. It then can
be shown that $P_1$ and $P_2$ are two check tuples of the
nonbinary Hamming perfect code over $GF(2^m)$ (see [8,
Chapter 6]), i.e., the tuples $(Q_0, Q_1, \ldots, Q_{2^m - 2}, P_1, P_2)$ form
the codewords of the Hamming perfect code over $GF(2^m)$.
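
For m = 2 the construction is small enough to enumerate. The sketch below builds all 64 codewords over GF(4) with $M = X^2 + X + 1$ and the three nonzero weights, then measures the minimum distance in symbols (not bits). Names and the integer encoding are assumptions of this illustration.

```python
from itertools import product

M_POLY = 0b111                      # X^2 + X + 1, primitive of degree m = 2
WEIGHTS = [0b01, 0b10, 0b11]        # the 2^m - 1 distinct nonzero polynomials

def gf2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def gf2_mod(a, m):
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a

def encode(q):
    """Append P1 = sum Q_i W_i mod M and P2 = sum Q_i to the information tuples."""
    p1, p2 = 0, 0
    for qi, wi in zip(q, WEIGHTS):
        p1 ^= gf2_mul(qi, wi)
        p2 ^= qi
    return tuple(q) + (gf2_mod(p1, M_POLY), p2)

def symbol_weight(codeword):
    return sum(1 for symbol in codeword if symbol)

codewords = [encode(q) for q in product(range(4), repeat=3)]
d_min = min(symbol_weight(cw) for cw in codewords if any(cw))
print(len(codewords), d_min)        # 64 3
```

The code is linear over GF(4), so the minimum nonzero symbol weight equals the minimum distance; the value 3 matches a single-error-correcting Hamming code of length $2^m + 1 = 5$.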

3.7  One’s-Complement  Checksum  and  Fletcher 
Checksum 

The above codes are constructed using polynomial arithmetic
because each tuple is considered as a polynomial over
the binary field {0, 1}. An alternative is to consider each
tuple as an integer and to use the rules of (one's-complement)
integer arithmetic in the code construction. If we apply
integer arithmetic to the construction of the block-parity
code and of the nonbinary perfect code, we get the
one's-complement checksum and the Fletcher checksum,
respectively. Thus, the integer-arithmetic version of
Algorithm 1a, along with variation (9), also produces the
one's-complement and Fletcher checksums. However, these
integer-based codes are often weaker than their binary
polynomial counterparts (see Fig. 1). See [4], [5], [9] for
definitions and performance comparisons of error-detection
codes, including the one's-complement and Fletcher checksums.
We will not discuss these checksums and integer-based codes
any further because of this weakness and because their
analyses can be found elsewhere (e.g., [5], [14]).
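
The difference between the two kinds of arithmetic can be sketched in C for 16-bit tuples. This is only an illustration under assumed conventions: the function names are hypothetical, and whether a protocol transmits the one's-complement sum or its bitwise complement is a convention that the text above does not fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block parity: polynomial (modulo-2) sum of the tuples, one XOR per tuple. */
static uint16_t block_parity16(const uint16_t *q, size_t n)
{
    assert(q != NULL || n == 0);
    uint16_t p = 0;
    for (size_t i = 0; i < n; i++)
        p ^= q[i];
    return p;
}

/* One's-complement sum: integer addition in which a carry out of
 * bit 15 is added back into bit 0 (end-around carry). */
static uint16_t ones_complement_sum16(const uint16_t *q, size_t n)
{
    assert(q != NULL || n == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += q[i];
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* fold the carry back in */
    }
    return (uint16_t)sum;
}
```

The two functions read the same tuples but generally return different check values, because carries propagate between bit positions only in the integer version.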

3.8  Other  Error-Detection  Codes 

Recall from Algorithms 1a and 1b that the $(ns, ns - r - m, d_C)$
code $C$ is constructed from an $(s, s - m, d_1)$ code $C_1$.
Thus, by varying $C_1$, different codes $C$ are produced.
Further, $C$ is nonlinear if $C_1$ is nonlinear. Thus far, the codes
$C$ are constructed from the codes $C_1$ that are either
extended perfect codes or trivial codes $\{0^s\}$ and $(s, s, 1)$.
Now, we construct the codes $C$ from the codes $C_1$ that are
neither perfect nor trivial. In both instances below, we
assume that $s = r + m = 16$, $n = 2^r$ with $r = 7$ or $8$, and
$d_1 = 6$, so that $d_C = 4$ by Theorem 1.2.

1. Suppose that $C_1$ is the extended (16, 7, 6) linear BCH
code (see [8], Chapter 3) and $r = 7$. Then, $ns = 2^r s =$
2,048 and the resulting code $C$ is a (2,048, 2,032, 4)
linear code.

2. Suppose that $C_1$ is the extended (16, 8, 6) nonlinear
Nordstrom-Robinson code (see [8, p. 73]) and
$r = 8$. Then, $ns = 2^r s =$ 4,096, and $C$ is a
(4,096, 4,080, 4) nonlinear code that is twice as
long as the linear code in 1.

4 Fast Implementation of Error-Detection Codes

Recall from Algorithm 1 that $r$ is the degree of the
modulating polynomial $M$ and $s$ is the number of bits
contained in each tuple $Q_i$. Algorithm 1 produces a large
family of error-detection codes because its systematic
versions (either Algorithm 1a when $r \le s$ or Algorithm 1b
when $r > s$) generate all the codes presented in Section 3. So
far, the discussion has been abstract and general to facilitate the
development of mathematical properties of our algorithms.
In this section, we focus on the practical aspect of these
algorithms, i.e., we now discuss how some codes generated
by these algorithms can be efficiently implemented in
software. Then, we compare the complexity of our
algorithms with that of the CRC algorithm (the strongest
code in Fig. 1). In theory, the fundamental unit of digital
data is the bit. In practice, however, communication protocols
and computers often process data as blocks of bits or tuples
(e.g., bytes or words) and not as individual bits.
For example, on familiar 32-bit computers, the modulo-2
addition of two 32-bit numbers can be accomplished by a
single XOR operation (using the C programming language).
Thus, efficient error-detection codes should also be processed
one tuple at a time, i.e., each $ns$-bit codeword
is expressed in terms of $n$ tuples, $s$ bits each.
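
The point about tuple-at-a-time processing can be sketched in C as follows. The helper names are hypothetical; the bit-by-bit version is included only to show what a single XOR replaces.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo-2 addition of two 32-bit tuples: a single XOR. */
static uint32_t add_mod2(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* The same sum computed one bit position at a time, for comparison only. */
static uint32_t add_mod2_bitwise(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (unsigned k = 0; k < 32; k++) {
        uint32_t bit = ((a >> k) + (b >> k)) & 1u;  /* integer add, keep low bit */
        sum |= bit << k;
    }
    return sum;
}

/* Self-check: both routes must give the same 32-bit result. */
static void check_same_sum(uint32_t a, uint32_t b)
{
    assert(add_mod2(a, b) == add_mod2_bitwise(a, b));
}
```

The word-wide version performs one operation where the bitwise version performs 32 loop iterations, which is the source of the factor-of-$s$ savings discussed in this section.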

In parallel to Algorithm 1a and Algorithm 1b, we now
develop two fast algorithms: Algorithm 2a for $r \le s$ and
Algorithm 2b for $r > s$. Although Algorithms 1a and 1b can
produce CRCs and many other codes (see Section 3), the
two fast algorithms produce only non-CRC codes, which are
shown later in Section 4.1 to be faster than CRCs by the
factor $O(s)$.

Now, suppose that information tuples

$$(Q_0, Q_1, \ldots, Q_{n-3}, U_1, U_2)$$

are given. Let $r$, $s$, and $m$ be such that $r \le s$ and $n \le 2^r$.
Assume that each $Q_i$ is $s$ bits, $U_1$ is $s - r$ bits, and $U_2$ is
$s - m$ bits. From the following algorithm, we can compute
the two check tuples $P_1$ and $P_2$ that are appended to the
information tuples such that the resulting code $C$ has
minimum distance $d_C = 4$.

Algorithm 2a. Let $r \le s$ and $n \le 2^r$. Let $M$ be a primitive
polynomial of degree $r$ and let $F$ be the encoder of an
$(s, s - m, d_1)$ code with $d_1 \ge 4$. Then, the resulting code $C$ is an
$(ns, ns - r - m, 4)$ code and each of its codewords is
$(Q_0, Q_1, \ldots, Q_{n-3}, U_1, U_2, P_1, P_2)$. The two check tuples are
computed by

$$P_1 = (Z + U_1 X^r) \bmod M \quad \text{and} \quad P_2 = Y_2 + F(Y_1),$$

where $Z = \sum_{i=0}^{n-3} Q_i X^{n-2-i} \bmod (M X^{s-r})$. The tuples $Y_1$ and
$Y_2$ are defined as in Algorithm 1a, i.e., they satisfy

$$(Y_1, Y_2) = Y_1 X^m + Y_2 = Y = \sum_{i=0}^{n-3} Q_i + U_1 X^r + P_1 + U_2 X^m.$$

Proof. Define $W_i = X^{n-2-i} \bmod M$, $i = 0, 1, \ldots, n - 3$. Then,
$W_i \ne 0, 1$ and $W_0, W_1, \ldots, W_{n-3}$ are distinct because $M$ is
a primitive polynomial of degree $r$ and $n \le 2^r$. Let $C_1$ be
the $(s, s - m, d_1)$ code with the encoder $F$. Now, using
these $W_i$ and $C_1$ in Algorithm 1a, we can construct the
code $C$ whose two check tuples are given by

$$P_1 = \left( \sum_{i=0}^{n-3} Q_i W_i + U_1 X^r \right) \bmod M \quad \text{and} \quad P_2 = Y_2 + F(Y_1).$$

Because $d_1 \ge 4$, $d_C = 4$ by Theorem 1.2. Next, the new
form of $P_1$ is derived as follows: First, note that

$$X^{s-r} W_i = (X^{s-r} X^{n-2-i}) \bmod (M X^{s-r}). \qquad (10)$$

Multiplying $P_1$ by $X^{s-r}$, we have

$$P_1 X^{s-r} = \left( \sum_{i=0}^{n-3} Q_i W_i + U_1 X^r \right) X^{s-r} \bmod (M X^{s-r}). \qquad (11)$$

From (10), (11), the definition of $Z$, and some
modular algebra manipulation, it can be shown that
$P_1 X^{s-r} = (X^{s-r} Z + U_1 X^r X^{s-r}) \bmod (M X^{s-r})$. Thus,

$$P_1 X^{s-r} = ((Z + U_1 X^r) \bmod M) X^{s-r}. \qquad (12)$$

From (12), we have $P_1 = (Z + U_1 X^r) \bmod M$. □
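
The computation of $P_1$ in Algorithm 2a can be sketched in C. The sketch assumes $s = 8$, $r = 4$, and $M = X^4 + X + 1$, so that $N = M X^{s-r} = X^8 + X^5 + X^4$ and $n \le 16$; the function names are hypothetical, and the encoder $F$ and the check tuple $P_2$ are omitted. A slow bit-by-bit routine, `p1_reference`, is included only to cross-check the tuple-at-a-time route.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define S_BITS 8       /* s: bits per tuple (assumed) */
#define R_DEG  4       /* r: degree of M (assumed) */
#define M_POLY 0x13u   /* M = X^4 + X + 1 */
#define N_POLY 0x130u  /* N = M * X^(s-r) = X^8 + X^5 + X^4 */

/* (T * X) mod N for deg T <= 7: one shift and at most one XOR. */
static uint8_t mulx_mod_n(uint8_t t)
{
    uint8_t shifted = (uint8_t)(t << 1);   /* the X^8 term falls off here */
    return (t & 0x80u) ? (uint8_t)(shifted ^ (N_POLY & 0xFFu)) : shifted;
}

/* Z = sum_{i=0}^{k-1} Q_i X^{k-i} mod N by Horner's rule, with k = n - 2. */
static uint8_t horner_z(const uint8_t *q, size_t k)
{
    assert(k >= 1 && k <= 14);             /* n <= 2^r = 16 */
    uint8_t t = q[0];
    for (size_t i = 1; i < k; i++)
        t = (uint8_t)(mulx_mod_n(t) ^ q[i]);
    return mulx_mod_n(t);
}

/* A mod M for deg A <= 7. */
static uint8_t mod_m(uint8_t a)
{
    for (int d = S_BITS - 1; d >= R_DEG; d--)
        if (a & (1u << d))
            a = (uint8_t)(a ^ (M_POLY << (d - R_DEG)));
    return a;
}

/* P1 = (Z + U1 X^r) mod M, where U1 has s - r = 4 bits. */
static uint8_t check_p1(const uint8_t *q, size_t k, uint8_t u1)
{
    assert(u1 < 16);
    return mod_m((uint8_t)(horner_z(q, k) ^ (u1 << R_DEG)));
}

/* Reference: build the whole polynomial, then reduce modulo M bit by bit. */
static uint8_t p1_reference(const uint8_t *q, size_t k, uint8_t u1)
{
    assert(k >= 1 && k <= 14 && u1 < 16);
    uint32_t acc = (uint32_t)u1 << R_DEG;
    for (size_t i = 0; i < k; i++)
        acc ^= (uint32_t)q[i] << (k - i);
    for (int d = 31; d >= R_DEG; d--)
        if (acc & (1u << d))
            acc ^= (uint32_t)M_POLY << (d - R_DEG);
    return (uint8_t)acc;
}
```

The two routes agree because $M$ divides $N$, so reducing modulo $N$ first and modulo $M$ afterwards gives the same remainder as reducing modulo $M$ directly.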

Remark 4. Let $I$ and $J$ be two binary polynomials of
degrees $i$ and $j$, respectively. In general, it is rather
tedious to compute $I \bmod J$. However, when $i \le j$, the
computation becomes easy and is accomplished in
constant time because

$$I \bmod J = \begin{cases} I & \text{if } i < j \\ I + J & \text{if } i = j. \end{cases}$$

This simple fact is used to efficiently compute
$Z = \sum_{i=0}^{n-3} Q_i X^{n-2-i} \bmod (M X^{s-r})$ in Algorithm 2a, as follows:
Using Horner's rule, we have

$$Z = (\cdots((Q_0 X) \bmod N + Q_1) X \bmod N + \cdots + Q_{n-3}) X \bmod N,$$

where $N = M X^{s-r}$. Then, $Z$ can be recursively
computed from the polynomials $T_i$ defined by $T_0 = Q_0$
and $T_i = (T_{i-1} X) \bmod N + Q_i$, $i = 1, \ldots, n - 3$.
Because $s = \deg N \ge \deg(T_{i-1} X)$, each $T_i$ is computed
in constant time, i.e., with $O(1)$ complexity. Finally, we
have $Z = (T_{n-3} X) \bmod N$. Thus, $Z$ has computational
complexity $O(n)$. Horner's rule is also used to efficiently
encode the WSC [4], [9].

Fig. 3 shows a simple software implementation of
Algorithm 2a. The input data are $(Q_0, Q_1, \ldots, Q_{n-3}, U_1, U_2)$.
The output is the check tuple $P = (P_1, P_2)$. The "for" loop is
used to compute both $Y$ and $T$. Computation of $Y$ requires
only one XOR operation, while $T$ can be efficiently
computed via Remark 4 because $\deg N \ge \deg(TX)$. Then,
the final value of $T$ is used to compute $Z$, i.e.,
$Z = (TX) \bmod N$.

Fig. 3. Pseudocode for Algorithm 2a. Here, $Q_i$, $U_1$, and $U_2$ are input
information tuples, $P$ is the output check tuple.

Now, we consider a fast version of Algorithm 1b for
constructing the code $C$. In this case, $r > s$, i.e., $r = as + b$,
where $a \ge 1$ and $0 \le b < s$. Assume that information
tuples $(Q_0, Q_1, \ldots, Q_{n-a-3}, U_1, U_2)$ are given, where each
$Q_i$ is $s$ bits, $U_1$ is $s - b$ bits, and $U_2$ is $s - m$ bits. We wish
to append two check tuples $P_1$ and $P_2$ to the information
tuples so that $(Q_0, Q_1, \ldots, Q_{n-a-3}, U_1, U_2, P_1, P_2)$ is a
codeword of the code $C$. Before stating the algorithm, we
need some preliminary results.

Remark 5. Algorithm 2b, which will be discussed
shortly, requires operations on new tuples
$Q^*_0, Q^*_1, \ldots, Q^*_{n-3}$ that are defined from the original
tuples $Q_0, Q_1, \ldots, Q_{n-a-3}$ as follows: First, let
$U = \{0, 1, \ldots, n - 4, n - 3\}$, then we partition the set $U$
into four sets $P$, $Q$, $X$, and $Y$:

$$P = \{i : 0 \le i \le n - 3 - a,\ i = n - 2 - js \text{ for some } j,\ 1 \le j \le a\},$$

$$Q = \{i : 0 \le i \le n - 3 - a,\ i \ne n - 2 - js \text{ for all } j,\ 1 \le j \le a\},$$

$$X = \{i : n - 3 - a < i \le n - 3,\ i = n - 2 - js \text{ for some } j,\ 1 \le j \le a\},$$

$$Y = \{i : n - 3 - a < i \le n - 3,\ i \ne n - 2 - js \text{ for all } j,\ 1 \le j \le a\}.$$

Because $|P| + |X| \le a$ and $a = |X| + |Y|$, we have
$|P| \le |Y|$. Let $p = |P|$, then $Y$ has at least $p$ elements.
So, let $Y^*$ be the set of $p$ smallest elements of $Y$, i.e.,
$Y^* = \{f_1, f_2, \ldots, f_p\}$. Similarly, we can write
$P = \{e_1, e_2, \ldots, e_p\}$. Finally, we can define the new tuples
$Q^*_0, Q^*_1, \ldots, Q^*_{n-3}$ from $Q_0, Q_1, \ldots, Q_{n-a-3}$ as follows:

$$Q^*_i = \begin{cases} 0 & \text{if } i \in P \cup X \\ Q_i & \text{if } i \in Q \\ 0 & \text{if } p < i \le n - 3, \end{cases}$$

and $Q^*_{f_i} = Q_{e_i}$ if $1 \le i \le p$.

Remark 6. Now, assume that $s \ge a + 1$ and $n \ge 2 + as$;
we will show that these conditions simplify
the definition of $Q^*_i$ (given in Remark 5). It can be
shown from these conditions that $0 \le n - 2 - js \le
n - 3 - a$ for all $1 \le j \le a$. Then, from Remark 5,
we have $p = |P| = a$, $|X| = 0$, $|Y| = a$, and $Y^* = Y$.
We also have $Y^* = \{n - 2 - j,\ j = a, a - 1, \ldots, 1\}$ and
$P = \{n - 2 - js,\ j = a, a - 1, \ldots, 1\}$. Note that
$f_i = n - 2 - (a + 1 - i)$ and $e_i = n - 2 - (a + 1 - i)s$, $1 \le i \le a$.
Thus, $Q^*_{f_i} = Q_{e_i}$ iff $Q^*_{n-2-(a+1-i)} = Q_{n-2-(a+1-i)s}$ iff
$Q^*_{n-2-j} = Q_{n-2-js}$. Finally, from Remark 5, we have

$$Q^*_i = Q_i \quad \text{if } i \ne n - 2 - js,\ 1 \le j \le a,\ 0 \le i \le n - 3 - a,$$
$$Q^*_{n-2-js} = 0, \quad 1 \le j \le a, \text{ and}$$
$$Q^*_{n-2-j} = Q_{n-2-js}, \quad 1 \le j \le a.$$

Basically, $Q^*_0, Q^*_1, \ldots, Q^*_{n-3}$ are obtained by moving
some $a$ tuples of $Q_0, Q_1, \ldots, Q_{n-a-3}$ to the right and
then by filling the $a$ removed tuples with zeros. Now,
define $Z = \sum_{i=0}^{n-3} Q^*_i X^{n-2-i} \bmod M$, which is a key
quantity in the following algorithm. This simplified
definition of $Q^*_i$ makes it possible to calculate $Z$ directly
from $Q_i$ (i.e., without using $Q^*_i$). That is, we first modify
$Q_i$ using the following pseudocode:

    for (1 <= j <= a) Q_{n-2-j} = Q_{n-2-js};
    for (1 <= j <= a) Q_{n-2-js} = 0;

then we can compute $Z$ directly from the modified $Q_i$ as
$Z = \sum_{i=0}^{n-3} Q_i X^{n-2-i} \bmod M$.
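
The tuple rearrangement of Remark 6 can be sketched in C as follows. The array layout is an assumption made for illustration: `q` has $n - 2$ slots, of which the first $n - a - 2$ hold the information tuples and the last $a$ are free on entry. The function name is hypothetical, and 8-bit storage is used only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Move a tuples to the right and zero the slots they came from.
 * q[0..n-a-3] hold information tuples; q[n-a-2..n-3] are free on entry. */
static void rearrange_tuples(uint8_t *q, size_t n, size_t a, size_t s)
{
    assert(s >= a + 1 && n >= 2 + a * s);   /* hypotheses of Remark 6 */
    for (size_t j = 1; j <= a; j++)
        q[n - 2 - j] = q[n - 2 - j * s];    /* Q_{n-2-j} = Q_{n-2-js} */
    for (size_t j = 1; j <= a; j++)
        q[n - 2 - j * s] = 0;               /* Q_{n-2-js} = 0 */
}
```

Under the two hypotheses the source slots lie at or below index $n - 3 - a$ and the destination slots lie above it, so the two loops never touch the same slot.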

Now, we have the following algorithm, which is proven in
Appendix A (which can be found on the Computer Society
Digital Library at http://computer.org/tc/archives.htm).

Algorithm 2b. Suppose that $r > s$ and $n \le 2^r$. Let $M$ be a
primitive polynomial of degree $r$ and let $F$ be the encoder of
an $(s, s - m, d_1)$ code with $d_1 \ge 4$. Then, the two check
tuples of the code $C$ are computed by

$$P_1 = (Z + U_1 X^r) \bmod M \quad \text{and} \quad P_2 = Y_2 + F(Y_1),$$

where $Z = \sum_{i=0}^{n-3} Q^*_i X^{n-2-i} \bmod M$ and $Q^*_i$ are defined in
Remark 5 (or in Remark 6 if applicable). The tuples $Y_1$ and
$Y_2$ are defined as in Algorithm 1b, i.e., they satisfy

$$(Y_1, Y_2) = Y_1 X^m + Y_2 = Y = \left( \sum_{i=0}^{n-a-3} Q_i \right) + (U_1 X^b + P_{10}) + \left( \sum_{i=1}^{a} P_{1i} \right) + U_2 X^m,$$

where $P_{10}$ is a $b$-tuple, and $P_{11}, \ldots, P_{1a}$ are $s$-tuples that
satisfy $P_1 = (P_{10}, P_{11}, \ldots, P_{1a})$. Further, $C$ is an
$(ns, ns - r - m, 4)$ code.

Fig. 4 shows a software implementation of
Algorithm 2b under the assumption that $s \ge a + 1$ and
$n \ge 2 + as$, as required in Remark 6. The input data are
$Q_0, Q_1, \ldots, Q_{n-a-3}, U_1, U_2$. The output is the check tuple
$P = (P_1, P_2)$. Note that, as in Algorithm 2a (see
Remark 4), the tuple $Z$ in Algorithm 2b can also be
computed in time $O(n)$.

Example 3. Here, we construct the code $C$ for the case $r > s$,
with $s = 8$ and $r = 12$. Let $m = 4$, then the total number
of check bits is $h = r + m = 16$. Because $r > s$, we can
write $r = as + b$ with $a = 1$ and $b = 4$. Let
$(Q_0, Q_1, \ldots, Q_{n-4}, U_1, U_2)$ be information tuples, where
each $Q_i$ is an 8-tuple, and $U_1$ and $U_2$ are 4-tuples, which can
be combined into a single 8-tuple $(U_1, U_2)$. We wish to
append a check 16-tuple $(P_1, P_2)$ to the information
tuples so that $(Q_0, Q_1, \ldots, Q_{n-4}, (U_1, U_2), P_1, P_2)$ forms a
codeword of the code $C$, which is an $(8n, 8n - 16, 4)$
code. Here, we let $F$ be the encoder of the (8, 4, 4)
extended Hamming code. The resulting code $C$ can have
length up to $2^{h-1} = 2^{15}$ bits (see Section 3.1).


    for (1 <= i <= a) Q_{n-2-i} = Q_{n-2-is};
    for (1 <= i <= a) Q_{n-2-is} = 0;
    T = Q_0; Y = Q_0;
    for (1 <= i <= n-3)
    {
        T = (T X) mod M + Q_i;
        Y = Y + Q_i;
    }
    Z = (T X) mod M;
    P_1 = (Z + U_1 X^r) mod M;
    P_1 = (P_{10}, P_{11}, ..., P_{1a});
    P_2 = P_{10} + U_1 X^b;
    for (1 <= i <= a) P_2 = P_2 + P_{1i};
    Y = Y + P_2 + U_2 X^m = Y_1 X^m + Y_2;
    P_2 = Y_2 + F(Y_1);
    P = P_1 X^m + P_2;

Fig. 4. Pseudocode for Algorithm 2b. Here, $Q_i$, $U_1$, and $U_2$ are input
information tuples, $P$ is the output check tuple.

In this example, $a = 1$, $s = 8$, and $n$ is the total number
of bytes in a codeword of the code $C$. If we assume
further that $n \ge 10$, then the hypotheses of Remark 6 are
satisfied, i.e., $s \ge a + 1$ and $n \ge 2 + as$. Thus, by Remark
6, we can modify the $Q_i$ by first setting $Q_{n-3} = Q_{n-10}$ and
then setting $Q_{n-10} = 0$. Then, the modified information
tuples are

$$(Q_0, Q_1, \ldots, Q_{n-11}, Q_{n-10}, Q_{n-9}, \ldots, Q_{n-4}, Q_{n-3}, (U_1, U_2)).$$

Then, as in Algorithm 2a, we can efficiently compute
the quantity $Z = \sum_{i=0}^{n-3} Q_i X^{n-2-i} \bmod M$, which is
shown in Fig. 4.
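
For the parameters of Example 3 ($s = 8$, $r = 12$), the computation of $Z$ can be sketched in C. The polynomial $X^{12} + X^6 + X^4 + X + 1$ is an assumption made for illustration (it is commonly tabulated as primitive, and the paper does not state which $M$ it uses); the function names are hypothetical. The routine `slow_z` exists only to cross-check the Horner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define R_DEG  12
#define M_POLY 0x1053u  /* X^12 + X^6 + X^4 + X + 1 (assumed choice of M) */

/* (T * X) mod M for deg T < 12: one shift and at most one XOR. */
static uint16_t mulx_mod_m(uint16_t t)
{
    uint16_t shifted = (uint16_t)(t << 1);
    return (shifted & (1u << R_DEG)) ? (uint16_t)(shifted ^ M_POLY) : shifted;
}

/* Z = sum_{i=0}^{k-1} Q_i X^{k-i} mod M by Horner's rule, with k = n - 2.
 * q holds the tuples after the rearrangement of Remark 6. */
static uint16_t horner_z(const uint8_t *q, size_t k)
{
    assert(k >= 1);
    uint16_t t = q[0];
    for (size_t i = 1; i < k; i++)
        t = (uint16_t)(mulx_mod_m(t) ^ q[i]);
    return mulx_mod_m(t);
}

/* Reference: build the whole polynomial, then reduce bit by bit. */
static uint16_t slow_z(const uint8_t *q, size_t k)
{
    assert(k >= 1 && k <= 24);              /* keeps the polynomial in 32 bits */
    uint32_t acc = 0;
    for (size_t i = 0; i < k; i++)
        acc ^= (uint32_t)q[i] << (k - i);
    for (int d = 31; d >= R_DEG; d--)
        if (acc & (1u << d))
            acc ^= (uint32_t)M_POLY << (d - R_DEG);
    return (uint16_t)acc;
}
```

Each loop iteration handles one 8-bit tuple, whereas the reference routine needs one step per bit of the accumulated polynomial.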

Remark  7. 

1. Given $r$ and $s$, either Algorithm 2a or Algorithm 2b
can be used to construct the code $C$ that is $ns$
bits long, where $1 \le n \le 2^r$. The values of $r$ and $s$
can be as small as 0 and 1, respectively. However,
the resulting code $C$ can be trivial, e.g., if $r = 0$,
then $n = 1$ and $C = C_1$. If $r = 0$, $s = 1$, and
$C_1 = \{0\}$, then $C = C_1 = \{0\}$. If $s = 1$, $C_1 = \{0\}$,
$r = 1$, and $n = 2^r = 2$, then $C = \{(0, 0)\}$. However,
when $s = 1$, $C_1 = \{0\}$, $r \ge 2$, and $n \ge 4$, the
resulting code $C$ can be nontrivial and each
codeword of $C$ now has $ns = n$ bits. In particular,
from Algorithm 2b, it can be shown that the two
check tuples of an $n$-bit codeword are

$$P_1 = Z \quad \text{and} \quad P_2 = \sum_{i=0}^{n-a-3} Q_i + \sum_{i=0}^{a} P_{1i},$$

i.e., $P_2$ is the even parity bit computed from the first
$n - 1$ bits of the codeword of $C$. For example, if
$r = 2$ and $n = 2^r = 4$, then $C$ is the (4, 1, 4) repetition
code. This (4, 1, 4) code is also constructed
from Algorithm 2a with $r = 1$, $n = 2^r = 2$, $s = 2$,
and $C_1 = \{(0, 0)\}$.
2. Let $r = 1$, then $M = X + 1$. Thus, the code $C$ is
$2s$ bits long (by Algorithm 2a) and
$P_1 = (U_1 X) \bmod (X + 1)$, which is the even parity
of $U_1$. For example, let $C_1$ be the (4, 1, 4) code, then
we can construct the code $C$ of length 8, which is
the (8, 4, 4) extended Hamming code. If we set
$C_1$ = (8, 4, 4) code, then we can construct the code
$C$ of length 16, i.e., $C$ = (16, 11, 4) code. Repeating
this process, we can construct (32, 26, 4) and
(64, 57, 4) codes. This method is related to the
scheme of [15] and is effective for constructing codes
that are small enough to fit into computer
words.

3. Let $r \ge 0$, $s \ge 1$, $C_1 = (s, s - m, d_1)$ with $d_1 \ge 4$,
and $h = r + m$. Then, using either Algorithm 2a
(if $r \le s$) or Algorithm 2b (if $r > s$), we can
construct the code $C$ that is an $(ns, ns - h, 4)$
code. In particular, if $n = 2^r$, then $C$ is a
$(2^r s, 2^r s - h, 4)$ code. That is, starting from a code $C_1$ of
length $s$, we can construct the code $C$ of length
$2^r s$. Further, if $C_1$ is a $(2^{m-1}, 2^{m-1} - m, 4)$ extended
perfect code, then $C$ is a $(2^{h-1}, 2^{h-1} - h, 4)$
extended perfect code. If $C_1$ is a linear perfect
code, then $C$ is also a linear perfect code. This
linear perfect code $C$ and the extended Hamming
perfect code of length $2^{h-1}$ are equivalent, i.e., one
code can be obtained from the other code by
reordering the bit positions and adding a constant
vector (see [8, p. 39]). Equivalent codes have the
same minimum distance and length, but their
implementation complexity can be very different.
However, our algorithms can also generate fast
codes that are different from the perfect codes.
For example, in Algorithm 2a, let $s = 16$ and let $F$
be the encoder of the extended (16, 8, 6) nonlinear
Nordstrom-Robinson code (see also Section 3.8).
Then, the resulting code $C$ is a nonlinear code
with $d_C = 4$, which is not equivalent to any
extended perfect code.
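
The parameter arithmetic in Remark 7 (and in Section 3.8) can be checked with a small C sketch. The struct and function names are hypothetical; the sketch only computes the triple $(2^r s, 2^r s - r - m, 4)$ from the parameters of $C_1$ and does not construct any code.

```c
#include <assert.h>

/* (length, information bits, minimum distance) of a block code. */
typedef struct {
    unsigned length;
    unsigned info;
    unsigned dist;
} code_params;

/* Parameters of C built from C1 = (s, s - m, d1) with n = 2^r tuples. */
static code_params extend_code(code_params c1, unsigned r)
{
    assert(c1.dist >= 4);                  /* d1 >= 4 is required for d_C = 4 */
    unsigned m = c1.length - c1.info;      /* check bits of C1 */
    unsigned n = 1u << r;                  /* number of s-bit tuples */
    code_params c = { n * c1.length, n * c1.length - r - m, 4 };
    return c;
}
```

Starting from the (4, 1, 4) code and applying `extend_code` repeatedly with $r = 1$ reproduces the chain (8, 4, 4), (16, 11, 4), (32, 26, 4), (64, 57, 4) listed in item 2.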

4.1  Software  Complexity 

Now, we compare software complexity between the code $C$
and the CRC (the strongest code in Fig. 1). Here, we focus
on implementations that require no table lookup.
Table-lookup methods are discussed later in Remark 8.2.

Suppose that $s \ge r$. Then, the binary code $C$ of length
$ns$ bits can be constructed using Algorithm 2a, whose
complexity is dominated by the computation of $Z$ and $Y$,
which can be computed by the for-loop in Fig. 3. Within this
for-loop, the expression $T = (TX) \bmod N + Q_i$ is computed
in constant time (by Remark 4), while the expression
$Y = Y + Q_i$ is computed by one XOR operation. Thus, this
for-loop has complexity $O(n)$. Hence, the time complexity of
the code $C$ is also $O(n)$. Similarly, when $s < r$, the code $C$
under Algorithm 2b also has time complexity $O(n)$ (see
Fig. 4). In summary, regardless of whether $s \ge r$ or $s < r$, the
code $C$ of length $ns$ can be encoded with time complexity $O(n)$.

Now, consider the CRC that also has length $ns$ bits. Here,
we limit our discussion to a generic CRC algorithm, i.e., a
general algorithm that is applicable to all generating
polynomials. Then, it is shown in Remark B3(a) that the
generic CRC algorithm has time complexity $O(ns)$. For
some specific generating polynomials whose nonzero terms
satisfy certain desirable properties, alternative algorithms
(such as shift and add [4] and on-the-fly [11]) may have
lower complexity.

When $s$ is considered as a constant, we have
$O(ns) = O(n)$. Thus, from a purely theoretical viewpoint,
both the CRC and the code $C$ have the same level of
complexity. However, the extra factor $s$ does not appear in
the time complexity of the code $C$, i.e., the code $C$ is
faster than the CRC by approximately the factor $O(s)$. We
will show later, in Remark 8.1, that $O(s) \approx 0.73s$ when these
error-detection codes are implemented in the C programming
language.

Example 4. Here, we study codes of $h = 16$ check bits (other
values of $h$ are discussed later in Remark 8.1). Assume
that $C_1$ is the $(s, s - m, 4)$ extended Hamming code and
the resulting code $C$ is constructed by Algorithm 2a or
Algorithm 2b. Thus, both the CRC and the code $C$ have
minimum distance $d = 4$ and the maximum code lengths
of the code $C$ and of the CRC are $2^{15}$ and $2^{15} - 1 \approx 2^{15}$
bits, respectively (see also Remark 7.3). Thus, in terms of
the minimum distance and maximum code length, the
code $C$ and the CRC perform almost identically. Our
goal here is to compare the software complexity of these
two codes. Software complexity refers to the number of
software operations needed to process one byte of a codeword.
Here, a code is called "faster" if it has a lower operation
count. Simply stated, we write software programs (in the
C programming language) for the code $C$ and the CRC.
Then, we count the number of software operations
needed by each code to encode one byte of a codeword.
Computer programs for these codes and the rules for
counting the operations are given in Appendix D (which
can be found on the Computer Society Digital Library at
http://computer.org/tc/archives.htm).

Recall that a typical codeword consists of $n$ tuples,
each of which has $s$ bits. Let $t_C(s, n)$ and $t_{CRC}(s, n)$ be the
software operation counts required to compute the $h = 16$
check bits for a codeword of the code $C$ and of the CRC,
respectively. Then, from (29) of Appendix D (which can
be found on the Computer Society Digital Library at
http://computer.org/tc/archives.htm), we have

$$t_C(s, n) = 7.5n + f(s),$$

where $f(8) = 33.5$, $f(16) = 51$, $f(32) = 165.5$, and
$f(64) = 372$. From Algorithms 2a and 2b, the two check
tuples are given by $P_1 = (Z + U_1 X^r) \bmod M$ and
$P_2 = Y_2 + F(Y_1)$. The first component of $t_C(s, n)$ is $7.5n$
and represents the cost of computing $Z$ and $Y = (Y_1, Y_2)$,
while the second component $f(s)$ is the cost of computing
$(Z + U_1 X^r) \bmod M$ and $Y_2 + F(Y_1)$. The first component
varies as a linear function of the tuple count $n$, while
the second component $f(s)$ depends only on the tuple
size $s$ and not on $n$. Thus, $f(s)$ is a transient component
whose contribution becomes negligible for large $n$.

For the CRC, from (30) of Appendix D (which can be
found on the Computer Society Digital Library at
http://computer.org/tc/archives.htm), we have

$$t_{CRC}(s, n) = 5.5ns + 3n - g(s),$$

where $g(8) = 52$, $g(16) = 93$, $g(32) = g(64) = 90$. For
example, let $s = 8$ and $n = 64$, i.e., $ns = 2^9 = 512$ bits.
Then, $t_C(8, 64) = (7.5)(64) + 33.5 = 513.5$, i.e., the code $C$
needs 513.5 operations to process 512 bits. Thus, the
operation count per byte of the code $C$ is
$(8)(513.5)/512 = 8.02$. Similarly, it can be shown that
the operation count per byte of the CRC is 46.2. Then, the
ratio of the byte operation counts of the CRC and the
code $C$ is $46.2/8.02 = 5.76$, i.e., the code $C$ is 5.76 times
faster than the CRC. The triplet (46.2, 8.02, 5.76) for the
pair $(s, ns) = (8, 2^9)$ is recorded in the top left part of
Fig. 5. Triplets for other pairs $(s, ns)$ are similarly
obtained.

Fig. 5. Operation count per byte of the CRC, operation count per byte of
the code C, and the ratio of the above two numbers.
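
The arithmetic of this example can be reproduced with a short C sketch that evaluates the two operation-count expressions for $h = 16$. The function names are hypothetical; the constants are the values of $f(s)$ and $g(s)$ quoted above.

```c
#include <assert.h>

/* Transient term f(s) of the code C for h = 16 (values quoted in the text). */
static double f_transient(unsigned s)
{
    switch (s) {
    case 8:  return 33.5;
    case 16: return 51.0;
    case 32: return 165.5;
    case 64: return 372.0;
    default: assert(!"unsupported tuple size"); return 0.0;
    }
}

/* Transient term g(s) of the CRC for h = 16 (values quoted in the text). */
static double g_transient(unsigned s)
{
    switch (s) {
    case 8:  return 52.0;
    case 16: return 93.0;
    case 32: return 90.0;
    case 64: return 90.0;
    default: assert(!"unsupported tuple size"); return 0.0;
    }
}

/* t_C(s, n) = 7.5 n + f(s) */
static double ops_code_c(unsigned s, unsigned n)
{
    return 7.5 * n + f_transient(s);
}

/* t_CRC(s, n) = 5.5 n s + 3 n - g(s) */
static double ops_crc(unsigned s, unsigned n)
{
    return 5.5 * n * s + 3.0 * n - g_transient(s);
}

/* Operations per byte for a codeword of n tuples, s bits each. */
static double per_byte(double ops, unsigned s, unsigned n)
{
    return 8.0 * ops / ((double)n * s);
}

/* Ratio of the CRC count to the code C count. */
static double speedup(unsigned s, unsigned n)
{
    return ops_crc(s, n) / ops_code_c(s, n);
}
```

For example, `per_byte(ops_code_c(8, 64), 8, 64)` evaluates the per-byte count of the code $C$ for the pair $(s, ns) = (8, 2^9)$, and `speedup(8, 64)` evaluates the corresponding ratio before rounding.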

The results for software complexity of these two codes
are summarized in Fig. 5, where $n$ is the total number of
$s$-tuples in a codeword, i.e., the total codeword length is
$ns$ bits. Here, we consider a wide range of codeword
lengths: from $2^9$ to $2^{15}$ bits (i.e., from 64 to 4,096 bytes).
Each cell has three numbers: The first number is the
operation count per byte of the CRC, the second number
is the operation count per byte of the code $C$, and the third
number is the ratio of the above two numbers and
represents the speed improvement of the code $C$
compared to the CRC.

From Fig. 5, as expected, the byte operation count of
the CRC slightly decreases when $s$ increases because
processing of larger tuples reduces loop overhead. The
CRC's operation count also slightly decreases with
decreasing $n$ due to the negative term $-g(s)$ in
$t_{CRC}(s, n)$. Note that the operation count of the CRC
varies only slightly over a wide range of the tuple size $s$
and of the codeword length $ns$. In contrast, the operation
count of the code $C$ varies much more as a function of $s$
and $ns$. Further, for each tuple size $s$, the code $C$ is faster
for longer codeword length $ns$. This is desirable because
speed is more important for longer messages. The reason
for the speed variation of the code $C$ is the contribution
from the transient term $f(s)$ to the overall speed of the code.
This contribution is noticeable (negligible) if the codewords
are short (long). For smaller tuple size $s$ (such as
$s = 8$ and 16), the transient term is smaller. Thus, the
overall speed variation (as a function of $ns$) of the code $C$
is also smaller. For larger $s$ (such as $s = 32$ and 64), the
transient term is greater, resulting in more speed
variation (as a function of $ns$) for the code $C$. From
Fig. 5, the code $C$ is substantially faster than the CRC,
especially for the tuple size $s = 32$ or 64 bits and the code
length $ns \ge 2^{13}$ bits = 1,024 bytes. In particular, if the
code length is $ns = 2^{15}$ bits = 4,096 bytes, then the code
$C$ is 23.4 and 43.1 times faster than the CRC when $s$ is 32
and 64 bits, respectively.

Fig. 6. Operation count per byte and table size in bytes.

Remark  8. 

1. In Example 4, we derive the operation count
expressions $t_C(s, n)$ and $t_{CRC}(s, n)$ for the special
case $h = 16$ check bits (when the codes are
implemented in the C programming language).
There, we also assume that the code $C_1$ used in
the construction of the code $C$ is the extended
Hamming code of length $s$. No such $C_1$ code is
needed for the CRC. However, from Figs. 3 and 4,
the same expressions also hold true for other
values of $h$ and for other codes $C_1$, but with
different transient terms that are now denoted as
$f(s, h, C_1)$ and $g(s, h)$ to reflect their dependency
on $s$, $h$, and $C_1$. Thus, in general, the
software operation counts required to compute
the $h$ check bits for a codeword (which consists of
$n$ tuples, each tuple is $s$ bits) of these two codes
are:

$$t_C(s, n, h, C_1) = 7.5n + f(s, h, C_1)$$
$$t_{CRC}(s, n, h) = 5.5ns + 3n - g(s, h),$$

where the transient terms $f(s, h, C_1)$ and $g(s, h)$
are independent of $n$ and their contributions
become negligible when $n$ is large enough. Thus,
for large $n$, we have

$$\frac{t_{CRC}(s, n, h)}{t_C(s, n, h, C_1)} \approx \frac{5.5ns + 3n}{7.5n} \approx \frac{5.5ns}{7.5n} \approx 0.73s,$$

which is an estimate of the speed improvement of
the code $C$ compared to the CRC. Again, for large
$n$, the code $C$ needs approximately 7.5 operations
to process one $s$-tuple or $60/s$ operations per byte.
Recall that, in general, the code $C$ is faster than
the CRC by the factor $O(s)$. Thus, we have
$O(s) \approx 0.73s$ when these error-detection codes are
implemented in the C programming language.

2. In Fig. 5, we show, without using table lookup,
the speed performance of the code $C$ and the
CRC, with $h = 16$ check bits. Now, we discuss
table-lookup implementations for the same codes.
For concreteness, here we assume that each tuple
$Q_i$ has size $s = 8$ bits, as is often used in
table-lookup implementations of common CRCs.
Larger values of $s$ can be similarly handled, but they
result in much larger table sizes. The results are
shown in Fig. 6 (whose detailed derivation is
given in Appendix D.1, which can be found on
the Computer Society Digital Library at
http://computer.org/tc/archives.htm). Note that,
because $s = 8$ is a small value, the transient terms
$f(s, h, C_1)$ and $g(s, h)$ are also small compared to
the overall operation counts of the codes. Thus, we
estimate the overall operation counts by omitting
these transient terms. In particular, the second
column shows that, without using table lookup,
the code $C$ and the CRC use 7.5 and 47 operations
per byte, respectively. The exact values, which
vary from 7.51 to 8.02 (for the code $C$) and from
46.2 to 47 (for the CRC), are recorded in Fig. 5.
The estimated operation counts and table sizes
are shown in Fig. 6. As expected, the operation
counts become smaller at the cost of larger tables.


5  Summary  and  Conclusions 

We develop Algorithm 1 for generating a large and general
family of binary error-detection codes. This algorithm has
two key parameters, $s$ and $r$, where $s$ is the size of each
tuple and $r$ is the degree of the modulating polynomial $M$.
Algorithm 1 is expressed in general and abstract form to
facilitate the mathematical development of the resulting
code $C$. Error-detection codes used in practice are often
systematic. Thus, Algorithm 1 is transformed into systematic
versions to yield Algorithm 1a (if $r \le s$) and Algorithm 1b
(if $r > s$).

A variety of error-detection codes (such as CRCs,
checksums, and other codes listed in Fig. 1) have been developed
over the years for applications that require reliable
communication or storage. These codes are traditionally
considered as unrelated and independent of each other.
They also differ considerably in performance and complexity.
More complex codes such as CRCs are stronger codes
(with minimum distance $d = 4$), whereas simple checksums
such as block-parity codes are weaker codes (with $d = 2$). In
Section 3, we show that all these diverse codes (from CRCs
to checksums), as well as other linear and nonlinear codes,
are special cases of Algorithm 1. Thus, these seemingly
unrelated codes, which were independently developed over
many years, come from a single algorithm.

From Fig. 1, CRCs have the best error-detection capability,
but introduce the longest encoding delay. In this
paper, we then introduce some non-CRC codes that have
good error-detection capabilities as well as fast encoding. In
Section 4, we present Algorithm 2a (for $r \le s$) and
Algorithm 2b (for $r > s$), which are fast versions of
Algorithm 1a and Algorithm 1b, respectively. These two
fast algorithms produce only non-CRC codes. Further, some
of these non-CRC codes are not only fast but also reliable.
To achieve the minimum distance $d = 4$ using $h$ check bits,
CRC length can be up to $2^{h-1} - 1$ bits, while the length of
some non-CRC codes can be up to $2^{h-1}$ bits (i.e., they are
fast versions of perfect codes). We compare the computational
complexity of these CRCs and non-CRC codes using
methods that require no table lookup. For long messages,
the non-CRC codes can be faster than the CRCs by the factor
$O(s)$. Further, $O(s) \approx 0.73s$ when these codes are implemented
in the C programming language. Finally, with the use
of table lookup, the operation counts are reduced at the cost
of precomputed tables.